Face Synthesis Driven by Audio Speech Input Based on HMMs
Authors
Abstract
In this paper, an HMM-based visual speech system driven by audio speech input is designed to render a face model while synchronous audio is played. Our approach differs substantially from the methods adopted by many other researchers. We first train a model for every initial and final in Mandarin, using a large quantity of audio training data recorded in different environments and spoken by different people. Then, recorded synchronous audio-visual speech data are used to adapt the trained models to our specific announcer. Such models are more robust in the synthesis phase, and satisfactory performance can be achieved even when the input audio speech is degraded by noise.
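The pipeline the abstract describes — classify the incoming audio against per-unit acoustic models, then use the recognized unit to drive the face — can be illustrated with a deliberately tiny sketch. This is not the paper's implementation: real systems use trained HMMs over spectral frames (e.g. MFCCs), whereas here each "model" is just a hypothetical mean feature vector, and the unit-to-viseme mapping is made up.

```python
# Toy sketch (illustrative only): each Mandarin initial/final is reduced
# to a mean acoustic feature vector; each incoming audio frame is
# classified to its nearest unit, and the unit selects a mouth-opening
# parameter for the face model. All numbers below are invented.

from math import dist  # Euclidean distance (Python 3.8+)

# Hypothetical acoustic means for three Mandarin units.
UNIT_MODELS = {
    "b":   (0.1, 0.9),
    "a":   (0.8, 0.2),
    "ang": (0.6, 0.7),
}

# Hypothetical mapping from recognized unit to a mouth-opening parameter.
VISEME_PARAM = {"b": 0.0, "a": 1.0, "ang": 0.6}

def frames_to_visemes(frames):
    """Classify each acoustic frame to its nearest unit model and
    return the corresponding viseme-parameter sequence."""
    out = []
    for f in frames:
        unit = min(UNIT_MODELS, key=lambda u: dist(f, UNIT_MODELS[u]))
        out.append(VISEME_PARAM[unit])
    return out

# A closed-mouth ("b"-like) frame followed by an open-vowel ("a"-like) frame.
print(frames_to_visemes([(0.1, 0.85), (0.75, 0.25)]))  # [0.0, 1.0]
```

In the actual system the per-frame nearest-mean decision would be replaced by HMM decoding over the whole utterance, which is what gives robustness to noisy input.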
Similar resources
Visual Speech Synthesis Based on Parameter Generation From HMM: Speech-Driven and Text-And-Speech-Driven Approaches
This paper describes a technique for synthesizing synchronized lip movements from an auditory input speech signal. The technique is based on an algorithm for parameter generation from HMM with dynamic features, which has been successfully applied to text-to-speech synthesis. Audio-visual speech unit HMMs, namely, syllable HMMs are trained with parameter vector sequences that represent both auditor...
Synthetic visual speech driven from auditory speech
We have developed two different methods for using auditory, telephone speech to drive the movements of a synthetic face. In the first method, Hidden Markov Models (HMMs) were trained on a phonetically transcribed telephone speech database. The output of the HMMs was then fed into a rule-based visual speech synthesizer as a string of phonemes together with time labels. In the second method, Artif...
Two- and Three-Dimensional Audio-Visual Speech Synthesis
An audio-visual speech synthesiser has been built that will generate animated computer-graphics displays of high-resolution, colour images of a speaker's mouth area. The visual displays can simulate the movements of the lower face of a talker for any spoken sentence of British English, given a text input. The synthesiser is based on a data-driven technique. It uses encoded, video-recorded image...
Speaker-independent 3D face synthesis driven by speech and text
In this study, a complete system that generates visual speech by synthesizing 3D face points has been implemented. The estimated face points drive MPEG-4 facial animation. This system is speaker independent and can be driven by audio or both audio and text. The synthesis of visual speech was realized by a codebook-based technique, which is trained with audio-visual data from a speaker. An audio...
Speech-to-lip movement synthesis based on the EM algorithm using audio-visual HMMs
This paper proposes a method to re-estimate output visual parameters for speech-to-lip movement synthesis using audio-visual Hidden Markov Models (HMMs) under the Expectation-Maximization (EM) algorithm. In the conventional methods for speech-to-lip movement synthesis, there is a synthesis method estimating a visual parameter sequence through the Viterbi alignment of an input acoustic speech sign...
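The Viterbi alignment mentioned above — finding the most likely HMM state path for the input audio, whose states then index the lip parameters — can be sketched minimally. This is an illustrative toy, not the cited paper's code: the two "mouth" states, the coarse acoustic labels, and all probabilities are invented for demonstration.

```python
# Minimal Viterbi sketch (illustrative only): decode the most likely
# state path for an observation sequence; in a speech-to-lip system the
# state path would then select the visual (lip) parameters per frame.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state path for `obs`."""
    # V[t][s] = (best probability of reaching s at time t, best path)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        V.append({
            s: max(
                ((prob * trans_p[prev][s] * emit_p[s][o], path + [s])
                 for prev, (prob, path) in V[-1].items()),
                key=lambda t: t[0],
            )
            for s in states
        })
    return max(V[-1].values(), key=lambda t: t[0])[1]

# Toy two-state model ("closed" vs "open" mouth); observations are
# coarse acoustic labels. All probabilities are made up.
states = ["closed", "open"]
start_p = {"closed": 0.6, "open": 0.4}
trans_p = {"closed": {"closed": 0.7, "open": 0.3},
           "open":   {"closed": 0.4, "open": 0.6}}
emit_p = {"closed": {"silence": 0.8, "vowel": 0.2},
          "open":   {"silence": 0.1, "vowel": 0.9}}

path = viterbi(["silence", "vowel", "vowel"], states, start_p, trans_p, emit_p)
print(path)  # ['closed', 'open', 'open']
```

The EM-based method the abstract proposes goes beyond this hard alignment by re-estimating the visual parameters as expectations over all state paths rather than committing to the single best one.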
Journal title:
Volume, issue:
Pages: -
Publication date: 2002